Ontology-based Crawler for the Semantic Web
Dissertation for the degree of Master in Applied Computer Science
Abstract
Information interoperability has received increased attention with the growing popularity of the Internet, the Web and distributed computing infrastructures. During this evolution, attention to semantics and ontologies as a means of achieving this interoperability has also increased. The same is happening on the Semantic Web, where ontologies are used to assign (agreed) meaning to the content of the Web. On the Semantic Web, data will inevitably be linked to many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between them. As the resources on the Semantic Web are annotated using these ontologies, new search techniques can be applied to find specific and structured information. This evolution calls for new tools to assist the user in the discovery and extraction of specific information resources from the Semantic Web. In this thesis, we describe an ontology-focused crawler for the Semantic Web. We argue that this crawler can exploit the semantic metadata to efficiently discover and extract information resources on the Semantic Web. In our approach, we use a topic ontology to guide the crawler to the relevant information. We present DMatch-lite as the algorithm used to guide the crawler. DMatch-lite is an automated ontology matching algorithm that matches the topic ontology with the Semantic Web ontologies discovered during the crawl. It computes the similarity score between both ontologies and returns a list of candidate mappings. These candidate mappings can later facilitate the integration of the extracted ontologies. By computing the similarity between a topic ontology and Semantic Web ontologies instead of comparing a keyword description to the text of the resources, we believe that the crawler is more efficient in the discovery and extraction of information resources. We embed this crawler system into the DOGMA Framework.
The DOGMA Studio workbench provides the ontology engineer with a powerful framework to model the topic ontology commitment used to guide the crawler. By embedding the crawler into the DOGMA Framework, the data extracted during the crawl can also be used efficiently in other applications such as ontology integration and elicitation.
Samenvatting (Summary, translated from Dutch)
With the growing popularity of the Internet and the Web, the need for information exchange has only increased. During that evolution, attention to semantics and ontologies has also grown strongly. A similar evolution can be observed on the Semantic Web, where ontologies are used to assign (agreed) meaning to the content of the Web. The data on the Semantic Web is thus inevitably linked to various ontologies, and processing this data is not possible without knowing the semantic relations between the different ontologies. Because data is annotated with various ontologies, new search algorithms can be developed to find specific and structured information. In this thesis, we describe an ontology-based crawler for the Semantic Web. We show that this crawler can exploit the semantically annotated data to discover and retrieve data on the Semantic Web efficiently. In our approach, we use a topic ontology to guide the crawler to the relevant information. We propose the DMatch-lite algorithm to exploit the semantically annotated data. DMatch-lite is an automated ontology-based matching algorithm that compares the source ontology with the ontologies found on the Semantic Web. It computes the similarity between both ontologies and returns a list of all candidate relations between the concepts of the ontologies. Instead of using an ordinary keyword description to guide the crawler to the relevant pages, we believe that our approach, with a source ontology and DMatch-lite as the ontology matcher, offers a better solution. This crawler was developed within the DOGMA Framework. We use the DOGMA Studio Workbench to model the source ontology. Moreover, by using the DOGMA Framework, the data collected during the crawl can be used in other applications, such as the integration and elicitation of ontologies.
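The crawl strategy described in the abstract (match each discovered ontology against a topic ontology, keep the candidate mappings, and follow links only from relevant resources) can be illustrated with a minimal sketch. This is not DMatch-lite, whose details are not given here: the token-overlap similarity, the thresholds, the `fetch` callback and the toy `web` dictionary are all assumptions made purely for illustration, and ontologies are reduced to flat lists of concept labels.

```python
import heapq
import re


def tokens(label):
    """Split a label such as 'ResearchPaper' or 'research_paper' into lowercase tokens."""
    return {t.lower() for t in re.findall(r"[A-Za-z][a-z]*|\d+", label)}


def label_similarity(a, b):
    """Jaccard overlap of label tokens (a simple stand-in, not DMatch-lite)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match(topic, candidate, threshold=0.5):
    """Return (overall score, candidate mappings) between two lists of concept labels."""
    mappings = []
    for t in topic:
        best = max(candidate, key=lambda c: label_similarity(t, c), default=None)
        if best is not None:
            s = label_similarity(t, best)
            if s >= threshold:
                mappings.append((t, best, s))
    score = sum(s for _, _, s in mappings) / len(topic) if topic else 0.0
    return score, mappings


def crawl(seeds, fetch, topic, min_score=0.3, max_visits=100):
    """Best-first crawl. fetch(url) must return (concept_labels, outgoing_urls)."""
    frontier = [(-1.0, url) for url in seeds]  # negated score: heapq is a min-heap
    heapq.heapify(frontier)
    seen = set(seeds)
    results = {}
    visits = 0
    while frontier and visits < max_visits:
        _, url = heapq.heappop(frontier)
        visits += 1
        concepts, links = fetch(url)
        score, mappings = match(topic, concepts)
        if score < min_score:
            continue  # irrelevant resource: do not follow its links
        results[url] = (score, mappings)
        for link in links:
            if link not in seen:
                seen.add(link)
                # Links inherit the relevance of the resource that cites them.
                heapq.heappush(frontier, (-score, link))
    return results


# Toy "web" standing in for real Semantic Web documents (hypothetical data).
web = {
    "a": (["Author", "ResearchPaper"], ["b", "c"]),
    "b": (["Recipe", "Ingredient"], ["d"]),
    "c": (["paper_author", "Conference"], []),
    "d": (["Author"], []),
}
results = crawl(["a"], lambda url: web[url], topic=["Author", "Paper"])
for url, (score, mappings) in results.items():
    print(url, score, mappings)
```

In this toy run, resource "d" is never fetched because it is only reachable through "b", which does not match the topic ontology. A keyword-based focused crawler would make the same pruning decision from page text; the approach in the abstract makes it from ontology similarity and, as a by-product, retains the mappings for later integration.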
Similar resources
Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
Due to the increasing web, there are many challenges to establish a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, But the problem ...
An Executive Approach Based On the Production of Fuzzy Ontology Using the Semantic Web Rule Language Method (SWRL)
Today, the need to deal with ambiguous information in semantic web languages is increasing. Ontology is an important part of the W3C standards for the semantic web, used to define a conceptual standard vocabulary for the exchange of data between systems, the provision of reusable databases, and the facilitation of collaboration across multiple systems. However, classical ontology is not enough ...
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not a simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...
Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems
Ontology is the main infrastructure of the Semantic Web which provides facilities for integration, searching and sharing of information on the web. Development of ontologies as the basis of semantic web and their heterogeneities have led to the existence of ontology matching. By emerging large-scale ontologies in real domain, the ontology matching systems faced with some problem like memory con...
An Improved Semantic Schema Matching Approach
Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Process (OLAP), Data mining, semantic web [2] and schema integration. This task is defined for finding the semantic correspondences between elements of two schemas. Recently, schema matching has found considerable interest in both research and practice. In this paper, we present a new impr...